Count-Min Sketches for Estimating Password Frequency within Hamming Distance Two
Authors
Abstract
The count-min sketch is a useful data structure for recording and estimating the frequency of string occurrences, such as passwords, in sub-linear space with high accuracy. However, it cannot be used to draw conclusions on groups of strings that are similar, for example close in Hamming distance. This paper introduces a variant of the count-min sketch which allows for estimating counts within a specified Hamming distance of the queried string. This variant can be used to prevent users from choosing popular passwords, like the original sketch, but it also allows for a more efficient method of analysing password statistics.
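For orientation, the standard count-min sketch that the abstract builds on can be sketched as below. This is a minimal illustration of the original data structure only, not the Hamming-distance variant the paper introduces. The class name `CountMinSketch`, the `width` and `depth` defaults, the salted BLAKE2b hashing, and the sample passwords are all assumptions made for the example.

```python
import hashlib


class CountMinSketch:
    """Plain count-min sketch (illustrative; not the paper's variant)."""

    def __init__(self, width=2048, depth=4):
        # width and depth are assumed values; they trade space for accuracy.
        self.width = width
        self.depth = depth
        self.table = [[0] * width for _ in range(depth)]

    def _index(self, row, item):
        # One hash function per row, derived by salting BLAKE2b with the row number.
        digest = hashlib.blake2b(
            item.encode("utf-8"),
            digest_size=8,
            salt=row.to_bytes(8, "little"),
        ).digest()
        return int.from_bytes(digest, "little") % self.width

    def add(self, item, count=1):
        # Increment one counter in every row.
        for row in range(self.depth):
            self.table[row][self._index(row, item)] += count

    def estimate(self, item):
        # Collisions can only inflate counters, so the minimum over rows
        # is an upper bound on the true count that never under-counts.
        return min(
            self.table[row][self._index(row, item)] for row in range(self.depth)
        )


sketch = CountMinSketch()
for password, times in [("123456", 5), ("password", 3)]:
    sketch.add(password, times)

# A password policy could reject any candidate whose estimate exceeds a threshold.
too_popular = sketch.estimate("123456") >= 5
```

Because the estimate is the minimum over rows, a popular password is never reported as rarer than it is, which is the property that makes the sketch usable for rejecting common passwords.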
Similar Resources
On Estimating Frequency Moments of Data Streams
Space-economical estimation of the $p$th frequency moment, defined as $F_p = \sum_{i=1}^{n} |f_i|^p$ for $p > 0$, is of interest in estimating all-pairs distances in a large data matrix [14], in machine learning, and in data stream computation. Random sketches formed by the inner product of the frequency vector $f_1, \ldots, f_n$ with a suitably chosen random vector were pioneered by Alon, Matias and Szegedy [1], and...
Periodicity and Cyclic Shifts via Linear Sketches
We consider the problem of identifying periodic trends in data streams. We say a signal $a \in \mathbb{R}^n$ is $p$-periodic if $a_i = a_{i+p}$ for all $i \in [n-p]$. Recently, Ergün et al. [4] presented a one-pass, $O(\mathrm{polylog}\, n)$-space algorithm for identifying the smallest period of a signal. Their algorithm required $a$ to be presented in the time-series model, i.e., $a_i$ is the $i$th element in the stream. We present a more g...
The Computational Hardness of Estimating Edit Distance
We prove the first nontrivial communication complexity lower bound for the problem of estimating the edit distance (aka Levenshtein distance) between two strings. To the best of our knowledge, this is the first computational setting in which the complexity of estimating the edit distance is provably larger than that of Hamming distance. Our lower bound exhibits a trade-off between approximation...
Improved Sketching of Hamming Distance with Error Correcting
We address the problem of sketching the Hamming distance of data streams. We present a new sketching technique, fixable sketches, and show that using such a sketch not only reduces the sketch size but also restores the differences between the streams. Our contribution: for two streams with Hamming distance bounded by k, we show a sketch of size O(k log n) with O(log n) processing time ...
Towards Low Carbon Similarity Search with Compressed Sketches
Sketches are compact bit string representations of objects. Objects that have the same sketch are stored in the same database bucket. By calculating the Hamming distance between sketches, an estimate of the similarity of their respective objects can be obtained. Objects that are close to each other are expected to have sketches with a small Hamming distance. This estimation helps to sched...